DFT Meets Wave-Function Methods for Accurate Structures and Rotational Constants of Histidine, Tryptophan, and Proline

A new computational strategy has been applied to the conformational and spectroscopic properties in the gas phase of amino acids with very distinctive features, ranging from different tautomeric forms (histidine) to ring puckering (proline), and heteroaromatic structures with non-equivalent rings (tryptophan). The integration of modern double-hybrid functionals and wave-function composite methods has allowed us to obtain accurate results for a large panel of conformers with reasonable computer times. The remarkable agreement between computations and microwave experiments allows an unbiased interpretation of the latter in terms of stereoelectronic effects.


■ INTRODUCTION
Increasing attention has been paid in the last years to the conformational landscape of amino acids, which couple limited dimensions with a remarkable flexibility tuned by different kinds of non-covalent interactions. 1,2Since environmental effects can strongly modify the characteristics of amino acids (for instance, zwitterionic forms are more stable in crystals 3 and aqueous solutions, 4 whereas neutral forms are exclusively found in the gas phase 5 or in inert matrixes 6 ), an unbiased disentanglement of intrinsic stereoelectronic features requires preliminary studies in the gas phase.Thanks to the development of spectrometers coupling supersonic-jet expansion 7 and laser ablation, 8 thermolabile molecules with high melting points (like most amino acids) have become accessible to high-resolution spectroscopy studies.However, the interpretation of experimental spectra in structural and thermochemical terms is made difficult by the fast relaxation of some conformers to more stable counterparts whenever the corresponding energy barriers can be overcome under the specific experimental conditions. 9−17 Furthermore, an effective exploration of flat potential energy surfaces (PESs) requires more refined strategies 18,19 with respect to the systematic searches and/or local optimization techniques routinely employed for small semirigid molecules.−22 In this framework, after the preliminary discovery of conformers lying in a sufficiently large energy range by relatively cheap methods, the structures of the most stable conformers are refined by a double-hybrid functional and, possibly, further improved by a linear regression approach (LRA) involving a few empirical parameters in order to correct systematic errors. 23,24Next, the transition states (TSs) ruling interconversion paths between pairs of conformers are found and energy minima connected to more stable species by low energy barriers are removed from the conformer list.Improved relative energies of the surviving conformers are evaluated by single-point computations with a wave-function composite method 25−30 and used, together with zero-point energies (ZPEs), computed by the double-hybrid functional mentioned above, to determine the final relative populations.Finally, spectroscopic parameters of the energy minima with non-negligible populations are computed. 16,31−42 This diversified landscape can be further enriched by a comprehensive analysis of amino acids showing additional features, like tautomerism (histidine, His), ring puckering (proline, Pro), or heteroaromatic structures with nonequivalent rings (tryptophan, Trp).Since all of these amino acids have been investigated in the gas phase by microwave (MW) spectroscopy, 38,43−45 the QC results must match those accurate experimental data.In this framework, the most distinctive feature of the present analysis with respect to previous studies is the coupling of feasibility and accuracy (relative mean unsigned errors (RMUEs) within 0.3% for rotational constants and 1% for quadrupolar coupling constants and relative energies), allowing the a priori prediction of experimental outcomes without any ad hoc assumption.

■ METHODS
As already mentioned in the Introduction, a preliminary exploration of the conformational PES by a fast semiempirical method 46 guided by ML algorithms 22,47 is followed by a characterization of low-energy conformers at the B3LYP-D3BJ/6-31+G* level. 48,49The same combination of functional and basis set (hereafter B3/SVP) will be used also for the computation of anharmonic contributions (vide infra).The geometries of conformers lying within 1500 cm −1 above the absolute energy minimum are refined by the revDSD-PBEP86-D3BJ double-hybrid functional 50−53 (hereafter rDSD) in conjunction with the jun-cc-pVTZ basis set 52 (hereafter j3).While the rDSD/j3 model provides excellent conformational landscapes 15,54,55 and geometrical parameters, 24 further refinements are needed for structures lying below 1000 cm −1 and not connected to more stable energy minima by barriers lower than 400 cm −1 . 56,57In fact, the main outcomes of experimental MW spectra are the ground-state rotational constants (B τ 0 , where τ refers to the inertial axes a, b, c), which include, together with equilibrium rotational constants (B τ eq ), also electronic contributions (neglected in the following due to their very small values) and vibrational corrections (ΔB τ vib ). 16,58,59The leading (and most expensive) contribution to vibrational corrections comes from cubic force constants. 60ortunately, these terms can be obtained at affordable levels of theory (B3/SVP in the present context) since errors of the order of 10% on vibrational corrections correspond to errors lower than 0.1% on the overall rotational constants. 16,61−65 As already mentioned in the Introduction, the systematic nature of the errors allows significant improvement of the rDSD/j3 results by the LRA; 24,66,67 however, the use of empirical parameters is not fully satisfactory.Therefore, an extensive benchmark of different basis sets and additional contributions was performed, which led to the selection of the cc-pVTZ-F12 basis set 68 (hereafter 3F12) and to the inclusion of core−valence (CV) correlation at the MP2 level in conjunction with the cc-pwCVTZ 69 (hereafter wC3) basis set.These choices define the new Pisa composite scheme (PCS), 70 in which each geometrical parameter (r) is obtained by combining the corresponding parameters optimized at different levels where r V2 is the geometrical parameter computed including an estimate of valence correlation energy with methods not exceeding the MP2 level (rDSD/3F12 in the present case) and Δr CV2 is the CV correction obtained from the difference between all-electron (ae) and frozen core (fc) MP2 computations in conjunction with the wC3 basis set.Several test computations have shown that PCS geometrical parameters are extremely accurate, and this will be further checked in the present context with reference to low-lying conformers of His, Trp, and Pro.Since all amino acids contain at least one nitrogen atom, 14 N nuclear quadrupole coupling constants (χ ii , with i referring to the inertia axis a, b, or c) play a non-negligible role in the accurate predictions of rotational spectra. 62,71Furthermore, the intensities of the different MW transitions are determined by the components of dipole moments (μ i ). 58,71While both dipole moments and quadrupole coupling constants can be computed with sufficient accuracy at the rDSD level, 39,72 accurate relative energies determining the conformer populations can be obtained by single-point energy evaluations on top of PCS geometries using composite wave-function methods rooted in the coupled cluster (CC) ansatz including single, double, and (perturbatively) triple excitations CCSD-(T). 73,74The final expression of the PCS energy is analogous to that of the PCS geometrical parameters, but now E V2 includes the CBS extrapolation and a further term (ΔE V ) is added to take into account valence correlation beyond the MP2 level.The CBS extrapolations appearing in both the E V2 and ΔE V terms are performed by the standard n −3 two-point formula. 75In order to allow the inclusion of those contributions, the E V2 term is evaluated at the MP2 level in place of the rDSD level employed for geometries where and with Finally In the equations above, all the energies are obtained within the fc approximation, unless the label ae (all-electron) is explicitly employed.
Finally, the ZPEs required for the computation of standard enthalpies at 0 K (ΔH 0 °) are evaluated in the framework of vibrational perturbation theory to second order (VPT2), 76−78 employing rDSD/3F12 harmonic frequencies 31 and B3/SVP anharmonic contributions, 79,80 except in the case of tryptophan, where also harmonic contributions have been obtained at the B3/SVP level.
The aim of the PCS model is to approach the accuracy of CCSD(T)+CBS+CV computations at the cost of a triple-ζ The Journal of Physical Chemistry A CCSD(T) computation and without any empirical parameter thanks to the evaluation of CBS and CV contributions by the inexpensive MP2 model.While use of smaller basis sets requires the introduction of empirical factors, 81 the success of the "cheap" family of methods 26 witnesses that this goal can be reached starting from triple-ζ basis sets, and the PCS model further improves the results.−84 The situation is different for gradient evaluations, where fast implementations are available only for DFT and MP2 (hence double-hybrids), which are, therefore, employed in the PCS model.In any case, since the dimensions of the studied molecules are small enough to allow the use of conventional approaches, all the computations have been performed with the Gaussian package. 85

■ RESULTS AND DISCUSSION
The B3/SVP equilibrium rotational constants and vibrational corrections of all of the molecules studied in the present work are collected in Table 1.It is apparent that, as mentioned in the Introduction, vibrational corrections are of the order of 1% of the corresponding rotational constants.As a consequence, they cannot be neglected when aiming at unbiased comparisons with experiments.
Before analyzing the specific targets of this work, let us consider the semirigid molecules corresponding to the side chains of histidine and tryptophan, namely, imidazole and indole.Since MW spectra are available for both molecules, a first estimate of the reliability of different methods can be obtained in the absence of the additional challenges related to backbone flexibility.The maximum and mean unsigned errors (MAX and MUE, respectively) are used together with their relative values (RMAX and RMUE) to analyze the quality of the results delivered by different computational models.
The results collected in Table 2 confirm that at most qualitative trends can be obtained by the methods usually employed for the interpretation of MW spectra (B3LYP and MP2), whereas the LRA based on rDSD/j3 computations confirms its remarkable performance.However, the new PCS approach delivers comparable results without the need for any empirical parameter besides those already present in the underlying electronic structure method.At this level, both CV correlation and vibrational corrections need to be included since they play an opposite role, but the issuing error compensation is far from being perfect.
The "soft" dihedral angles governing the conformational landscape of His and Trp belong either to the backbone LP is the nitrogen lone-pair perpendicular to the plane defined by the two amine hydrogens and the C α atom (Figure 1).Only nearly planar conformations are allowed for the carboxy moiety of all amino acids (ω = C α −C′−O−H ≈ 0°or 180°), with ω ≈   1.

The Journal of Physical Chemistry A
0°being preferred, unless the oxidryl hydrogen is involved in strong hydrogen bonds with other electronegative atoms.The c, g, s, and t labels are then used to indicate the cis, gauche, skew, and trans conformations determined by the "soft" dihedral angles in the following order: ϕ, ψ, χ 1 , and χ 2 .In the case of Pro, the ψ and ω dihedral angles retain the same definitions, but the puckering of the pyrrolidine ring must be properly defined. 88A full description of five-membered rings requires in principle two pseudorotation coordinates, the puckering amplitude (α) and the phase angle (τ), 89,90 which can be obtained from the endocyclic torsion angles χ 1 , ..., χ 5 (see Figure 1): Two tautomeric forms are possible for histidine, depending on the presence of an acidic hydrogen on N δ or N ϵ (referred to as δ and ϵ in the following).In agreement with previous computational studies, 91 the exploration of the conformational landscape of His provided several low-energy structures, most of which are stabilized by hydrogen bonds of type I (bifurcated 8 As already reported for other amino acids 39 some low-energy conformers of type III (ϕ ≈ 180°, ψ ≈ 0°, ω ≈ 180°, tct) have been found, but they can easily relax to more stable I conformers overcoming the very small energy barriers governing rotation around the ψ dihedral angle.Also the number of detectable conformers of type I is reduced by fast relaxations through nearly free rotation around the χ 1 dihedral angle.Therefore, our computations suggest that only four conformers might be detected in MW experiments.The corresponding structures are shown in Figure 1, and the main structural and energetic features are collected in Table 3 and Table S1 of the Supporting Information.Contrary to the usual situation for aliphatic amino acids, 39 conformers of type II are more stable than their type I counterparts (in spite of a less favorable orientation of the OH group in the carboxy moiety) since only the former conformers can establish a favorable interaction between the aromatic π-system of imidazole and one aminic hydrogen of the backbone. 92The different intramolecular hydrogen bonding networks ruling the relative stability of type II structures have been recently analyzed, 91 and our results confirm the main conclusion of that work.The results collected in Table 3 show that ZPEs tune the relative stability of the type I and type II conformers.
Furthermore, the difference between rDSD/3F12 and PCS relative electronic energies does not exceed 20 cm −1 and that between B3/SVP and rDSD/3F12 relative ZPEs does not exceed 10 cm −1 .The former results confirm the reliability of the rDSD/3F12 model, whereas the second result permits confident use of B3/SVP ZPEs for larger molecules, like tryptophan.
Despite the accessible relative energy of the species δIIgg and ϵIIg − g (δIIa and ϵIIb according to the nomenclature of ref 45) only the most stable species ϵIIgg − (ϵIIa according to the nomenclature of ref 45) has been detected in the experimental MW study. 45The PCS spectroscopic parameters are given in Table 4 together with their experimental counterparts.4 with the results of previous investigations 45 shows that our computational approach reduces the RMUE on rotational constants by about 1 order of magnitude (0.27% at the PCS level and 2.32% from the MP2 computations reported in ref 45), which, in absolute terms, translates into a PCS maximum error of about 4 MHz (to be compared to 27 MHz at the MP2 level).

Comparison of Table
This finding shows that PCS computations are accurate enough to assign the detected conformer on the basis of rotational constants since the differences between the computed values of different structures (see Table 1 in ref 45) are larger than the maximum error.As already mentioned, the nitrogen atoms of His (N a in the amino group of the backbone and N δ , N ϵ in the imidazole ring of the side chain; see Figure 1) give rise to quadrupole coupling constants.The results collected in Table 4 show that the experimental and computed values are in remarkable agreement.It is also noteworthy that the PCS value of the HNC α C′ dihedral angle (−21.8°) is very close to that estimated in ref 45 in order to minimize the difference between computed and experimental quadrupole coupling constants.
In agreement with previous studies, 38,93,94 the backbone of the most stable Trp conformers shows a hydrogen bond pattern of type II.Furthermore, the preferred conformation of the χ 2 dihedral angle (governing the position of the indole ring) is close to 90°(broadly referred to as g) and the most stable structures are characterized by the interaction of one aminic hydrogen of the backbone with the π-system of the phenyl (χ 1 ≈ 60°) or pyrrole (χ 1 ≈ −60°) ring of indole.The Best estimates of relative enthalpies at 0 K (ΔH 0 °) and dihedral angles (Figure 1) optimized at the PCS level are also given.The angles are in degrees, whereas all of the energetic quantities are in cm −1 .b At the rDSD/3F12 level and (in parentheses) at the B3/SVP level.c At the B3/SVP level.d Sum of columns 2, 3, 4, and 5.  1.

The Journal of Physical Chemistry A
structures of the two most stable conformers (IIgg and IIg − g or IIb+ and IIc+ according to the nomenclature of ref 38) are shown in Figure 1, while their main features are collected in Table 5 and Table S2 of the Supporting Information.It is quite apparent that the stabilizing effect of interactions involving the phenyl ring (IIgg conformer) is considerably stronger, with respect to those in which the pyrrole ring is engaged (IIg − g conformer).In analogy with the case of histidine, a single conformer (IIgg) was initially detected in MW studies.However, analysis of other isotopologues pointed out the presence of a second less abundant species.In fact, both 14 N and 15 N isotopologues could be observed for the nitrogen atoms of Trp (N a in the amino group and N ϵ in the pyrrole ring; see Figure 1).Since our computations should be sufficiently accurate to discriminate between isotopologues, we compare in Table 6 the computed and experimental values.The agreement between theory and experiment is indeed remarkable, and the errors are once again about an order of magnitude smaller than those delivered by previous computations.As a consequence, the computed rotational constants allow the unbiased assignment of the detected species, with quadrupolar coupling constants further confirming the results.Proline is the only natural amino acid whose side chain closes a (pyrrolidinic) cycle.Exhaustive conformational searches produced six low-energy conformers with two representatives each for the I, II, and III forms.However, the two species of type III are too unstable (more than 1000 cm −1 above the absolute energy minimum) to be detected in MW studies.The other four low-energy species adopt an envelope (E) arrangement with either exo-or endo-like placements of the carboxy moiety.The puckering amplitude α is always very close to 40°, and the phase angle τ is close either to 90°(E + , exo COOH) or −90°(E − , endo COOH) 88,95 (see Figure 1).All of those species have actually been detected in MW experiments, 43,44 and their structural and energetic parameters are given in Table 7 and Table S3 of the Supporting Information.
It is apparent that species of type II are significantly more stable than their counterparts of type I and that exo or endo placements of the carboxyl groups have comparable energies.The relative stabilities of the different species obtained at the rDSD/3F12 level are in fair agreement with their PCS Best estimates of relative enthalpies at 0 K (ΔH 0 °) and dihedral angles (see Figure 1) optimized at the PCS level are also given.The angles are in degrees, whereas all the energetic quantities are in cm −1 .b At the B3/SVP level.c Sum of columns 2, 3, 4, and 5.  Best estimates of relative enthalpies at 0 K (ΔH 0 °) and dihedral angles (see Figure 1) optimized at the PCS level are also given.The angles are in degrees, whereas all the energetic quantities are in cm −1 .b At the rDSD/3F12 level and (in parentheses) at the B3/SVP level.c At the B3/SVP level.d Sum of columns 2, 3, 4, and 5.

The Journal of Physical Chemistry A
counterparts (MAX = 64 cm −1 and MUE = 35 cm −1 ), which are, in turn, close (MAX = 28 cm −1 and MUE = 18 cm −1 ) to the values obtained in ref 88 employing the so-called focal point analysis (FPA).Actually, the difference between rDSD/ 3F12 and PCS relative energies is often of the same order of magnitude as the corresponding difference between B3/SVP and rDSD/3F12 harmonic ZPEs or between harmonic and anharmonic ZPEs.This finding confirms that none of these contributions can be neglected in order to reach fully converged values. 96he experimental spectroscopic parameters are compared with the computed parameters in Table 8.It is quite apparent that the presence of large-amplitude puckering of the pyrrolidine ring increases the errors of the computed rotational constants with respect to those obtained for the other amino acids, with the effect being particularly strong for the B a rotational constant of the IE + species.However, even in those circumstances, the agreement between computed and experimental rotational constants remains sufficiently good to permit the unequivocal assignment of the detected species, which is, anyway, further confirmed by quadrupolar coupling constants.

■ CONCLUSIONS
A general computational workflow aimed at the accurate description of the conformational landscape of flexible biomolecule building blocks has been applied to α-amino acids showing peculiar features such as tautomerism, ring puckering, or different aromatic rings.Accurate structures and relative energies are obtained by the new PCS model, which combines modern double-hybrid functionals and composite wave-function methods.In particular, geometries and spectroscopic parameters are obtained at the rDSD/3F12 level, whereas improved relative energies are computed by the PCS wave-function composite method.The agreement between computed and experimental results for histidine, tryptophan, and proline permits the unbiased interpretation of the latter in terms of well-defined stereoelectronic effects, with the synergism between intra-backbone and backbone−(side chain) non-covalent interactions playing a central role.
The above results, together with those of refs 39 and 42, provide a general panorama of natural α-amino acids.In more general terms, the reasonable cost and black-box implementation of the PCS model pave the way toward accurate studies of flexible prebiotic molecules containing a few dozen atoms also by nonspecialists.
Different contributions to the relative energies and Cartesian coordinates of the PCS equilibrium geometries for all the low-energy structures discussed in the main text (PDF) ■  1.

Table 1 .
Equilibrium Rotational Constants and Vibrational Corrections for the Species Detected in MW Experiments Computed at the B3/SVP Level a All of the values are given in MHz.b The lowest frequency normal mode has been left harmonic. a

Table 2 .
Comparison between the Experimental and Computed Rotational Constants of Imidazole and Indole a a All the values (except relative errors) are in MHz.b From ref 86 for imidazole and ref 87 for indole.c Includes B3/SVP vibrational corrections from Table

Table 6 .
Experimental Ground-State Rotational Constants (B a 0 , B b 0 , and B c 0 in MHz) and 14 N-Nuclear Quadrupole Coupling Constants (χ in MHz) of the Two Most Stable Tryptophan Conformers Compared with Computed Values a a Rotational constants of the 15 N isotopomers are also reported.b From ref 38.c PCS equilibrium geometries, rDSD/3F12 properties, and B3/SVP vibrational corrections from Table 1.d Fixed in the fitting.

Table 8 .
Experimental Ground-State Rotational Constants (B a 0 , B b 0 , and B c 0 in MHz) and 14 N-Nuclear Quadrupole Coupling Constants (χ in MHz) of the Four Most Stable Proline Species Compared with Computed Values From ref 44.b PCS equilibrium geometries, rDSD/3F12 properties, and B3/SVP vibrational corrections from Table a Calc.b Exp. a Calc.b Exp. a Calc.b Exp. a Calc.b a